Korean Compound Noun Decomposition Using Syllabic Information Only

Authors

  • Seong-Bae Park
  • Jeong Ho Chang
  • Byoung-Tak Zhang
Abstract

Compound nouns are formed freely in Korean, because independent nouns can be concatenated without a postposition. Systems that handle compound nouns, such as machine translation and information retrieval systems, therefore have to decompose them into single nouns before the text can be analyzed correctly. This paper proposes the GECORAM (GEneralized COmbination of Rule-based learning And Memory-based learning) algorithm for Korean compound noun decomposition using only syllabic information. Rule-based learning algorithms have the merit of high comprehensibility, but they show low performance in many application tasks. To tackle this problem, GECORAM combines rule-based learning with memory-based learning. According to the experimental results, GECORAM achieves higher accuracy than either rule-based learning or memory-based learning alone.
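The abstract does not say how the two learners are combined, so the following Python sketch is only an illustration of the general idea, not the GECORAM algorithm itself. It assumes that decomposition is framed as labelling each gap between adjacent syllables as a boundary or not, that a rule fires when the syllable pair around a gap was seen consistently in training, and that a k-nearest-neighbour memory is consulted only when no rule fires. The training words, the function names, the feature window, and the parameters `min_support` and `k` are all hypothetical.

```python
from collections import Counter, defaultdict

# Toy training data, invented for illustration: each compound is paired with
# the set of syllable offsets at which a new noun starts.
TRAIN = [
    ("정보검색", {2}),         # 정보 + 검색
    ("기계번역", {2}),         # 기계 + 번역
    ("정보처리", {2}),         # 정보 + 처리
    ("자연언어처리", {2, 4}),  # 자연 + 언어 + 처리
    ("기계학습", {2}),         # 기계 + 학습
]

def gap_instances(compound, boundaries=frozenset()):
    """Yield (features, label) for every gap between adjacent syllables.

    Features are the two syllables on each side of the gap ("_" is padding).
    """
    padded = ["_"] + list(compound) + ["_"]
    for i in range(1, len(compound)):
        feats = (padded[i - 1], padded[i], padded[i + 1], padded[i + 2])
        yield feats, i in boundaries

def train(data, min_support=2):
    """Store all instances (memory) and extract unanimous syllable-pair rules."""
    memory = [inst for word, b in data for inst in gap_instances(word, b)]
    by_key = defaultdict(list)
    for feats, label in memory:
        by_key[feats[1:3]].append(label)  # key: syllables adjacent to the gap
    rules = {
        key: labels[0]
        for key, labels in by_key.items()
        if len(labels) >= min_support and len(set(labels)) == 1
    }
    return rules, memory

def knn_label(feats, memory, k=3):
    """Majority vote among the k stored instances with the highest overlap."""
    def overlap(other):
        return sum(a == b for a, b in zip(feats, other))
    nearest = sorted(memory, key=lambda inst: -overlap(inst[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def decompose(compound, rules, memory):
    """Split a compound: use a rule if one fires, else fall back to memory."""
    nouns, start = [], 0
    for i, (feats, _) in enumerate(gap_instances(compound), start=1):
        key = feats[1:3]
        is_boundary = rules[key] if key in rules else knn_label(feats, memory)
        if is_boundary:
            nouns.append(compound[start:i])
            start = i
    nouns.append(compound[start:])
    return nouns

RULES, MEMORY = train(TRAIN)
print(decompose("기계처리", RULES, MEMORY))
```

In this sketch the rules stay readable (each one is a single syllable pair with a fixed label), while the memory handles gaps that the rules do not cover. Only syllables are used as features, in keeping with the paper's stated constraint.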


Similar resources

A Study of Query Optimization for Korean Compound Nouns

Compound nouns are one of the phenomena of Korean that information retrieval models developed in the English-speaking community have difficulty handling as index terms, even though they appear very frequently in Korean. A compound noun consists of more than one noun and takes various forms. It has been regarded as a major problem for indexing and search. In particular, compound noun analysis is difficult and comp...


Korean Compound Noun Term Analysis Based on a Chart Parsing Technique

Unlike compound noun terms in English and French, where words are separated by white space, Korean compound noun terms are not separated by white space. In addition, some compound noun terms in the real world result from a spacing error. Thus the analysis of compound noun terms is a difficult task in Korean NLP. Systems based on probabilistic and statistical information extracted from a corpus ...


Segmentation of Compound Nouns using Composite Mutual Information

In Korean, a compound noun may be freely formed with or without spaces between simple nouns. The flexible word formation rule of Korean raises a serious problem in processing compound nouns with computers, in particular, in searching a dictionary with the compound noun as a search key. This paper describes a corpus-based method for segmenting a compound noun into simple nouns. Segmentation is per...


A Multi-phase Semi-supersense Tagging of Korean Unknown Nouns

Supersense tagging is the problem of finding a corresponding semantic super tag (e.g., Phenomenon, Act) based on syntactic information and annotated corpora. However, we employ semantic information rather than syntactic information and annotated corpora, because the Korean language has a relatively flexible syntactic structure and lacks annotated corpora. To construct the automatic sense tagging system for ...


Compound Noun Segmentation Based on Lexical Data Extracted from Corpus

Compound noun analysis is one of the crucial problems in Korean language processing because a series of nouns in Korean may appear without white space in real texts, which makes it difficult to identify the morphological constituents. This paper presents an effective method of Korean compound noun segmentation based on lexical data extracted from a corpus. The segmentation is done in two steps: ...




Publication date: 2004